The global smartphone market has seen several market leaders over the past few years. The only two major brands that have remained at the top throughout are Apple and Samsung. The two brands are compared regularly and compete for market share. Samsung was the market leader in the first three quarters of 2021. In September 2021, Apple released its new iPhone generation 13 and gained market share, taking over the market leader position. In February 2022, Samsung released the new model in its Galaxy S series. These two phones are the flagships of the current smartphone market.
Source: IDC. (2022). Global smartphone market share from 4th quarter 2009 to 4th quarter 2021 (by vendor). Statista. Statista Inc.. Accessed: April 14, 2022.
The Samsung Galaxy S series is the high-end line produced and sold by Samsung, a South Korean multinational electronics company. The S series has 13 generations. Together with the Galaxy Z and the (discontinued) Galaxy Note series, the S series serves as Samsung's flagship line (Wikipedia, n.d.). The S series uses the open source operating system Android (Samsung).
In February 2022 the S22 was released in three variants: S22, S22 Plus (also: S22+) and S22 Ultra (Wikipedia, n.d.). It comes in the colours shown in the picture below, namely phantom white, burgundy, phantom black and green (Samsung). The S22 Ultra variant also comes with a so-called “S Pen”, which lets the user write on the phone and is recharged inside the phone (Samsung).
Samsung advertises the S22 series with the following features:
"The phone that makes everyday epic
Nightography camera
A battery that lasts the day and beyond
Our fastest chip ever"
(Source: Samsung)
The Apple iPhone series is the only smartphone series produced and sold by Apple Inc., a US-American multinational technology company (Wikipedia).
Generation 13 was released in September 2021 in four variants: iPhone 13, iPhone 13 mini, iPhone 13 Pro and iPhone 13 Pro Max. Apple's current line-up also includes the iPhone SE. The regular variant and the mini can be bought in the colours green, pink, blue, midnight, starlight and red. The Pro and Pro Max variants are available in alpine green, silver, gold, graphite and sierra blue. The iPhone SE is available in midnight, starlight and red (Apple).
Apple advertises the iPhone Pro with the following features:
"A dramatically more powerful camera system.
A display so responsive, every interaction feels new again.
The world’s fastest smartphone chip.
Exceptional durability.
And a huge leap in battery life." (Apple)
| | Samsung S22 | Apple iPhone 13 |
|---|---|---|
| Starting Price | $799 | $799 |
| Screen size | 6.1 inches (2340 x 1080) | 6.1 inches (2532 x 1170) |
| Refresh rate | 48Hz-120Hz adaptive | 60Hz |
| CPU | Snapdragon 8 Gen 1 (US); Exynos 2200 (K) | A15 Bionic |
| RAM | 8GB | 4GB (based on teardowns) |
| Storage | 128GB, 256GB | 128GB, 256GB, 512GB |
| Rear cameras | 50MP wide (f/1.8); 12MP ultrawide (f/2.2); 10MP telephoto (f/2.4) with 3x optical zoom | 12MP main (f/1.6), 12MP ultrawide (f/2.4) |
| Front camera | 10MP (f/2.2) | 12MP (f/2.2) |
| Battery size | 3,700 mAh | 3,227 mAh (based on teardowns) |
| Battery life (Hrs:Mins) | 7:51 | 10:33 |
| Charging speeds | 25W wired, 15W wireless | 20W wired; 15W wireless |
| Size | 5.7 x 2.8 x 0.3 inches | 5.8 x 2.8 x 0.3 inches |
| Weight | 5.9 ounces | 6.14 ounces |
| Colors | Black, white, green, pink gold | Black, white, blue, pink, red, green |
Source: https://www.tomsguide.com/face-off/samsung-galaxy-s22-vs-iphone-13
Suggestions to customers about the products based on other customers’ opinions
Opinions that support effective decision making by the customer
Feedback provided to the companies to improve their products and business
Which features did customers like most?
Which features can be improved?
We decided to extract data from Twitter with the keywords “Samsung S22” and the hashtag “#samsungs22” for the period 31.03.2022 to 08.04.2022. The period is limited because we do not have a premium license that gives access to the full timeline of tweets through the Twitter API.
For the keywords “Samsung S22” we excluded the users ShopeeID and _arllee and all retweets from our query, because there were several thousand tweets about a competition to win a Samsung mobile phone, which caused a lot of duplicated data.
In addition, we identified nine users who were creating advertising spam and non-valuable tweets and had to be excluded: “whitestonedome”, “FromKorea5”, “dome_glass”, “Whitestone_DE”, “whitestone_UK”, “jp_whitestone”, “Whitestone__FR”, “WhitestoneJapan”, “WhitestoneEU”.
In total we could gather 3846 tweets.
Our initial data frame has 17 attributes. The meaning of each attribute is documented on the Twitter Developer Platform.
In the following code we checked that the data.frame was correctly formatted:
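A format check of this kind could look like the following minimal sketch. It runs on a small hypothetical data frame: the column names `created_at`, `screen_name` and `text` and the helper `check_format()` are assumptions for illustration, and the real export has 17 attributes.

```r
# Hypothetical stand-in for the downloaded tweets (the real data frame has 17 attributes)
tweets <- data.frame(
  created_at  = as.POSIXct(c("2022-03-31 10:15:00", "2022-04-01 08:30:00"), tz = "UTC"),
  screen_name = c("user_a", "user_b"),
  text        = c("Loving my new Samsung S22!", "The S22 camera is great"),
  stringsAsFactors = FALSE
)

# Inspect the structure: column names, types and the first values
str(tweets)

# Basic format checks: expected types and no missing tweet text
check_format <- function(df) {
  is.data.frame(df) &&
    inherits(df$created_at, "POSIXct") &&
    is.character(df$text) &&
    !anyNA(df$text)
}
check_format(tweets)
```

`str()` gives a quick visual overview, while the helper returns a single TRUE or FALSE that can be used to stop a script early when the download is malformed.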
We extracted data for the search terms “iPhone 13” (excluding retweets) and the hashtag “#iphone13” from the Twitter API. As with the Samsung S22, we had to exclude certain users who were advertising (“whitestonedome”, “FromKorea5”, “domeglassapple”) as well as competitions to win an iPhone by retweeting or copy-pasting a specific text (“Join the event to win an iPhone 13!”). We also found a high number of tweets that randomly listed brand names, e.g. “rolex iphone” in one tweet.
In total we could gather 8369 tweets.
Our initial data frame has 17 attributes. The meaning of each attribute is documented on the Twitter Developer Platform.
In the following code we checked that the data.frame was correctly formatted:
While there were only 3846 tweets from users about the Samsung S22, there were about 2.18 times as many tweets (8369) about the Apple iPhone 13 during the period considered.
On 3 out of 9 days the Samsung S22 had more tweets than the iPhone 13. On the other days the iPhone 13 exceeded the Samsung S22 by a wide margin. Surprisingly, the number of tweets and users per day for the Samsung S22 remains constant, while there are only a few tweets about the iPhone 13 in the first 3 days and a high volume of tweets from 3rd April until 7th April 2022. There were not only more tweets but also a larger group of users posting about the iPhone 13. We could not identify a specific reason for this phenomenon because the last official Apple event was in March 2022 and not in the time period we tested.
There is no similar indicator from Google Trends or other sources. The search terms iPhone 13, Samsung S22, Apple and Samsung are relatively constant over time, as the image below shows (screenshot from Google Trends). It can be noted that Apple is looked up more often than Samsung and that the iPhone 13 is more popular on Google Search than the Samsung S22. This gives us a first impression that the iPhone 13 generally receives more attention.
There are more tweets than users, which indicates that some users post more than one tweet. We calculated a ratio for this: avg_tweets_per_user = tweets per day / users per day.
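The ratio can be illustrated with a small base R sketch. The data frame below is hypothetical (five made-up tweets with columns `day` and `user`), so the numbers do not come from the collected tweets.

```r
# Hypothetical tweets: one row per tweet with posting day and author
tweets <- data.frame(
  day  = c("2022-04-01", "2022-04-01", "2022-04-01", "2022-04-02", "2022-04-02"),
  user = c("a", "a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Tweets per day and distinct users per day
tweets_per_day <- tapply(tweets$user, tweets$day, length)
users_per_day  <- tapply(tweets$user, tweets$day, function(u) length(unique(u)))

daily <- data.frame(
  day    = names(tweets_per_day),
  tweets = as.vector(tweets_per_day),
  users  = as.vector(users_per_day)
)

# avg_tweets_per_user = tweets per day / users per day
daily$avg_tweets_per_user <- daily$tweets / daily$users
daily
```

A value of 1 means that every user posted exactly once on that day; values above 1 indicate that at least one user posted several times.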
Interestingly, although there was less conversation about Apple in the first three days, there were more tweets per user on those days, so it seems to have been a conversation among a smaller group of users. We see the same for the Samsung S22 between 4th April and 6th April 2022.
We decided to preprocess the data based on the following steps:
All text was converted to lower case, e.g. Hello to hello
All contractions were converted to the longer form, e.g. don’t to do not
All common internet slang was converted to formal English, e.g. TGIF to thank God it is Friday
Hashtags (#) were removed
Word elongation was replaced by the usual word form, e.g. heeeeey to hey
All non-ASCII characters were replaced with an equivalent or removed, e.g. © to (C)
White space within the string was reduced to one white space
White space at the start and end of the string was removed
“RT”, indicating that the tweet is a retweet, was removed
All links were removed based on the start of “http”
All @username mentions were removed
Punctuation was removed
Stop words were removed based on the RegEx approach
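Several of the steps listed above can be sketched in base R as follows. This is only an illustration: the function `clean_tweet()` and the short stop word list are hypothetical, and the steps that rely on lookup tables (contractions, internet slang, word elongation, non-ASCII replacement) are left out.

```r
# Hypothetical stop word list; the analysis uses a full English list instead
stop_words_demo <- c("the", "is", "a", "my", "and", "to")
stopwords_regex_demo <- paste0("\\b(", paste(stop_words_demo, collapse = "|"), ")\\b")

clean_tweet <- function(x) {
  x <- tolower(x)                             # lower case
  x <- gsub("\\brt\\b", " ", x)               # drop the retweet marker
  x <- gsub("http[^[:space:]]+", " ", x)      # drop links starting with "http"
  x <- gsub("@\\w+", " ", x)                  # drop @username mentions
  x <- gsub("[^[:alnum:][:space:]]", " ", x)  # drop punctuation, including "#"
  x <- gsub(stopwords_regex_demo, " ", x)     # drop stop words
  x <- gsub("\\s+", " ", x)                   # collapse repeated white space
  trimws(x)                                   # trim leading and trailing white space
}

clean_tweet("RT @someone: The #SamsungS22 camera is AMAZING!!! https://t.co/abc")
```

In this sketch the white space is collapsed last, so the gaps left behind by the removed links, mentions and stop words do not remain in the result.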
wordcloud(word_s, min.freq = 60, max.words = 40, random.order = FALSE, colors = pal)
Samsung S22 users mentioned Samsung, Galaxy (the series the S22 belongs to) and Ultra (a specific variant). The Ultra variant is comparable to the iPhone 13 Pro.
iPhone, Pro, Max, Note and OnePlus are also mentioned; these refer to other comparable smartphones in the market.
Other words that are often mentioned belong to specifications that users talked about:
android, update, security
camera, pixel, video
features
mediatek
screen
amp
case
wordcloud(word_a, min.freq = 60, max.words = 40, random.order = FALSE, colors = pal)
We can see that the Apple iPhone is mentioned in the variants Pro, Mini and Max. Pro seems to be the most important one, followed by Max and then Mini.
As with the Samsung S22, competitor terms appear here as well: “Ultra”, “Galaxy” and “Samsung”.
The specifications of the phone that users talked about were:
pixel, camera
battery, amp
price
case
green (could mean colors?)
For iPhone 13, we can notice that some adjectives and verbs were mentioned a lot:
available, buy
win, free
still, now
good, better, best, like
will, get, want, need, can
This time we additionally preprocess the data with the following steps:
Emojis were replaced by their word form, e.g. a smiling emoji to smiling
Emoji identifiers (emoticons) were replaced by their word form, e.g. :-) to smiling
stopwords_regex <- paste(stopwords('en'), collapse = '\\b|\\b')
stopwords_regex <- paste0('\\b', stopwords_regex, '\\b')
Samsung_df <- samsung_df$text %>%
str_to_lower() %>% #all text to lower case
replace_contraction() %>% #replaces contractions to longer form
replace_internet_slang() %>% #replaces common internet slang
replace_hash(replacement = "") %>% #removes hashtags
replace_word_elongation() %>% #removes word elongation, e.g. "heeeeey" to "hey"
replace_emoji() %>% #replaces emojis with the word form
replace_emoji_identifier() %>% #replaces emoji identifiers to word form
replace_non_ascii() %>% #replaces common non-ASCII characters.
str_squish() %>% #reduces repeated whitespace inside a string
str_trim() %>% #removes whitespace from start and end of string
{gsub("\\b(rt|via)((?:\\b\\W*@\\w+)+)","",.)} %>% #remove retweet markers (the text is already lower case)
{gsub("http[^[:blank:]]+","",.)} %>% #remove links that start with http
{gsub('@\\w+', '', .)} %>% #remove @username mentions
{gsub("[[:punct:]]"," ",.)} %>% #remove punctuation
{gsub("[^[:alnum:]]"," ",.)} %>% #replace any remaining non-alphanumeric character
{gsub("\\bpro\\b"," ",.)} %>% #removes the whole word "pro" because it has a different context here
stringr::str_replace_all(stopwords_regex, '') %>% #remove stop words
unique()#remove duplicates
tail(Samsung_df)
Apple_df <- apple_df$text %>%
str_to_lower() %>% #all text to lower case
replace_contraction() %>% #replaces contractions to longer form
replace_internet_slang() %>% #replaces common internet slang
replace_hash(replacement = "") %>% #removes hashtags
replace_word_elongation() %>% #removes word elongation, e.g. "heeeeey" to "hey"
replace_emoji() %>% #replaces emojis with the word form
replace_emoji_identifier() %>% #replaces emoji identifiers to word form
replace_non_ascii() %>% #replaces common non-ASCII characters.
str_squish() %>% #reduces repeated whitespace inside a string
str_trim() %>% #removes whitespace from start and end of string
{gsub("\\b(rt|via)((?:\\b\\W*@\\w+)+)","",.)} %>% #remove retweet markers (the text is already lower case)
{gsub("http[^[:blank:]]+","",.)} %>% #remove links that start with http
{gsub('@\\w+', '', .)} %>% #remove @username mentions
{gsub("[[:punct:]]"," ",.)} %>% #remove punctuation
{gsub("[^[:alnum:]]"," ",.)} %>% #replace any remaining non-alphanumeric character
{gsub("\\bpro\\b"," ",.)} %>% #removes the whole word "pro" because it has a different context here
stringr::str_replace_all(stopwords_regex, '') %>% #remove stop words
unique() #remove duplicates
tail(Apple_df)
The sentiment analysis
At first glance, sentiments are similar for both smartphone models, and both are commented on with highly positive sentiment. This means that users generally speak positively about both models.
Samsung S22 has more trust and anticipation while iPhone 13 brings more joy to its users but also more sadness, surprise, fear and disgust. We can see from this that iPhone 13 users are in general more emotional - in a negative and positive direction - than Samsung S22 users.
sentimentterms_s <- attributes(extract_sentiment_terms(Samsung_df))$count %>%
mutate(weight_in_perc = round(polarity*n/sum(polarity*n)*100,digits=2)) %>%
arrange(desc(weight_in_perc)) %>%
filter(abs(polarity)>0.01)
head(sentimentterms_s,10)
##### Sentimentr score ######
sentimentr_samsung <- sentiment_by(Samsung_df, by=NULL)
# You can see the sentiment per tweet ID:
ggplot(data=sentimentr_samsung,aes(x=element_id,y=ave_sentiment, color=ave_sentiment))+
geom_line()
# You can see the summary of minimum, IQR, median and mean for all variables. For us, word_count and ave_sentiment are mostly interesting:
summary(sentimentr_samsung)
# You can see the variance and standard deviation for the Sentiment Score & Word Count:
data.frame(" "= c("Average", "Variance", "Standard Deviation"),
"Sentiment Score" = c(round(mean(sentimentr_samsung$ave_sentiment),2), round(var(sentimentr_samsung$ave_sentiment),2),
round(sd(sentimentr_samsung$ave_sentiment),2)),
"Word Count"=c(round(mean(sentimentr_samsung$word_count),2), round(var(sentimentr_samsung$word_count),2),
round(sd(sentimentr_samsung$word_count),2)))
Samsung S22 tweets were on average 10.4 words long while Apple iPhone 13 tweets had 9.9 words. The longest tweet was about the iPhone 13, with 135 identified words. The longest Samsung S22 tweet, with 34 words, is considerably shorter.
The sentimentr package in R estimates the sentiment polarity by sentence. The sentiment for the iPhone 13 was more widely distributed, ranging from -1.6 to +1.9 with a mean of 0.26, while Samsung had a sentiment polarity from -1.1 to +1.6 with a mean of 0.15. The higher variance within the data is consistent with our findings from the plot “Relative Sentiment Score based on Tweets about Apple and Samsung” earlier. We assume that people are emotionally more dependent on their iPhone 13 than on their Samsung S22.
To see the sentiments per tweet, generate these HTML files:
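To make the idea of a polarity score per sentence more concrete, here is a strongly simplified, lexicon-based sketch in base R. It is not the algorithm of the sentimentr package, which among other things also accounts for valence shifters such as negators and amplifiers. The mini lexicon, the function `simple_polarity()` and the example sentences are hypothetical.

```r
# Hypothetical mini lexicon: word -> polarity
lexicon <- c(good = 1, great = 1, love = 1, bad = -1, slow = -1, hate = -1)

# Sum of the polarities of known words, divided by the square root of the word count
simple_polarity <- function(sentence) {
  words <- strsplit(tolower(sentence), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) == 0) return(0)
  hits <- lexicon[words]  # NA for words that are not in the lexicon
  sum(hits, na.rm = TRUE) / sqrt(length(words))
}

scores <- vapply(
  c("I love the great camera", "The battery is bad", "It is a phone"),
  simple_polarity, numeric(1), USE.NAMES = FALSE
)

# Summary statistics of the kind compared in the text
c(min = min(scores), max = max(scores), mean = mean(scores), var = var(scores))
```

Because such a score is unbounded, values outside the interval from -1 to +1 are possible, which matches the ranges reported for both phones.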
Now we want to extract the sentiment terms for both phones. What is liked by the users? What is not?
# build a corpus from the words and convert it to a term-document matrix
corpus <- Corpus(VectorSource(word_s))
tdm_s <- TermDocumentMatrix(corpus)
tdm_s2 <- removeSparseTerms(tdm_s, sparse = 0.95)
ms2 <- as.matrix(tdm_s2)
# cluster terms
distMatrix_s <- dist(scale(ms2))
fit_s <- hclust(distMatrix_s, method = "ward.D")
plot(fit_s)
rect.hclust(fit_s, k = 6,border = "red") # cut tree into 6 clusters
# build a corpus from the words and convert it to a term-document matrix
corpus <- Corpus(VectorSource(word_a))
tdm_a <- TermDocumentMatrix(corpus)
tdm_a2 <- removeSparseTerms(tdm_a, sparse = 0.95)
ma2 <- as.matrix(tdm_a2)
# cluster terms
distMatrix_a <- dist(scale(ma2))
fit_a <- hclust(distMatrix_a, method = "ward.D")
plot(fit_a)
rect.hclust(fit_a, k = 6,border = "red") # cut tree into 6 clusters
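The clustering of terms can be followed on a toy example. The sketch below uses a small hypothetical term-document matrix (`tdm_demo`) in place of the real one and cuts the tree into 3 clusters instead of 6; only base R functions are used.

```r
# Hypothetical term-document matrix: rows are terms, columns are tweets
tdm_demo <- matrix(
  c(1, 1, 0, 0,
    1, 1, 0, 0,
    0, 0, 1, 1,
    0, 0, 1, 1,
    1, 0, 1, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("camera", "pixel", "battery", "charge", "price"),
                  paste0("tweet", 1:4))
)

# Euclidean distance between terms on the raw counts, then Ward clustering
# (the scaling step is skipped for this tiny example)
dist_demo <- dist(tdm_demo)
fit_demo  <- hclust(dist_demo, method = "ward.D")

# Cut the tree into 3 clusters and list the cluster of each term
clusters <- cutree(fit_demo, k = 3)
clusters
```

Terms that occur in the same tweets, here "camera" and "pixel" or "battery" and "charge", have distance zero and therefore end up in the same cluster.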
$CTM
Topic 1 Topic 2 Topic 3 Topic 4 Topic 5 Topic 6 Topic 7 Topic 8 Topic 9 Topic 10
[1,] "case" "new" "case" "phone" "get" "phone" "just" "just" "phone" "phone"
[2,] "got" "just" "green" "get" "case" "get" "phone" "get" "new" "mini"
[3,] "new" "buy" "new" "can" "new" "case" "got" "phone" "amp" "can"
[4,] "one" "case" "like" "buy" "camera" "new" "now" "case" "mini" "want"
[5,] "get" "camera" "need" "green" "use" "use" "want" "camera" "like" "get"
[6,] "amp" "will" "will" "new" "green" "like" "will" "now" "case" "case"
[7,] "now" "want" "mini" "win" "got" "know" "buy" "got" "can" "like"
[8,] "want" "price" "buy" "want" "mini" "can" "green" "new" "get" "one"
[9,] "use" "phone" "now" "case" "know" "will" "like" "want" "camera" "use"
[10,] "like" "one" "win" "one" "buy" "make" "price" "amp" "give" "just"
$VEM
Topic 1 Topic 2 Topic 3 Topic 4 Topic 5 Topic 6 Topic 7 Topic 8 Topic 9 Topic 10
[1,] "get" "get" "case" "case" "case" "phone" "phone" "phone" "phone" "got"
[2,] "new" "new" "camera" "can" "get" "use" "camera" "case" "get" "can"
[3,] "phone" "now" "phone" "phone" "phone" "now" "new" "just" "can" "new"
[4,] "want" "like" "green" "got" "use" "will" "use" "can" "just" "green"
[5,] "buy" "buy" "like" "just" "new" "got" "can" "new" "want" "now"
[6,] "mini" "just" "new" "want" "want" "like" "one" "use" "case" "just"
[7,] "camera" "got" "buy" "new" "buy" "want" "get" "one" "one" "mini"
[8,] "plus" "mini" "will" "green" "will" "green" "mini" "like" "like" "amp"
[9,] "don" "phone" "one" "will" "one" "case" "don" "will" "got" "phone"
[10,] "got" "use" "mini" "now" "green" "look" "green" "buy" "need" "good"
$VEM_Fixed
Topic 1 Topic 2 Topic 3 Topic 4 Topic 5 Topic 6 Topic 7 Topic 8 Topic 9 Topic 10
[1,] "get" "get" "case" "case" "case" "phone" "phone" "phone" "phone" "got"
[2,] "new" "new" "camera" "can" "get" "use" "camera" "case" "get" "can"
[3,] "phone" "now" "phone" "phone" "phone" "now" "new" "just" "can" "new"
[4,] "want" "like" "green" "got" "use" "will" "use" "can" "just" "green"
[5,] "buy" "buy" "like" "just" "new" "got" "can" "new" "want" "now"
[6,] "mini" "just" "new" "want" "want" "like" "one" "use" "case" "just"
[7,] "camera" "got" "buy" "new" "buy" "want" "get" "one" "one" "mini"
[8,] "plus" "mini" "will" "green" "will" "green" "mini" "like" "like" "amp"
[9,] "don" "phone" "one" "will" "one" "case" "don" "will" "got" "phone"
[10,] "got" "use" "mini" "now" "green" "look" "green" "buy" "need" "good"
$Gibbs
Topic 1 Topic 2 Topic 3 Topic 4 Topic 5 Topic 6 Topic 7 Topic 8 Topic 9 Topic 10
[1,] "singl" "phone" "success" "eidi" "result" "oppo" "select" "pull" "crazi" "due"
[2,] "condit" "case" "annoy" "leav" "built" "forev" "stay" "\"look" "flat" "activ"
[3,] "level" "get" "finger" "busi" "clear" "owner" "bug" "sehri" "tho" "whatev"
[4,] "snapchat" "new" "futur" "crystal" "depend" "upload" "mini\"," "space" "faster" "alon"
[5,] "spot" "just" "green\"," "\"tech" "pixel" "wow" "realm" "\"io" "bet" "\"nah"
[6,] "\"upgrad" "use" "imag" "cuz" "trick" "announc" "star" "definit" "earli" "certifi"
[7,] "bike" "can" "till" "foldabl" "\"like" "wish" "station" "site" "sensor" "matter"
[8,] "choos" "buy" "bag" "glass" "cri" "bought" "armor" "telephoto" "shell" "newest"
[9,] "line" "want" "mutual" "plate" "forget" "gonna" "bag" "web" "either" "sick"
[10,] "tini" "green" "rang" "convinc" "hous" "gotten" "consid" "win" "provid" "steve"
The topics containing “seri”, “now”, “camera” and “update” relate to the S22 Ultra’s April patch, which introduced many camera-related features.
# Most common Positive and Negative words using Bing
iphone.reviews.text %>%
unnest_tokens(word, review_text) %>%
anti_join(stop_words) %>%
anti_join(ignore.words) %>%
inner_join(get_sentiments("bing")) %>%
dplyr::count(word, sentiment, sort = TRUE) %>%
filter(n > 2) %>%
mutate(word = reorder(word, n)) %>%
mutate(percent = round(n/sum(n), 3)) %>%
ggplot(aes(x = word, y = percent, fill = sentiment, label = percent)) +
geom_col(show.legend = FALSE) +
facet_wrap(~sentiment, scales = "free_y") +
geom_text(aes(y = 0.7*percent)) +
labs(title = "iPhone 13 Word Polarity (bing)") +
coord_flip() +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))
Joining, by = "word"
[1] 3.117557
# Correlation Terms
# The correlation of appearing together in a review
samsung.correlation.terms <- samsung.reviews.text %>%
mutate(review = row_number()) %>%
unnest_tokens(word, review_text) %>%
filter(!word %in% stop_words$word) %>%
group_by(word) %>%
filter(n() >=7)%>%
pairwise_cor(word, review, sort = TRUE)
samsung.correlation.terms
library(ggraph)
library(igraph)
samsung.correlation.terms %>%
filter(correlation >= 0.50) %>%
graph_from_data_frame() %>%
ggraph(layout = "igraph", algorithm = "kk") +
geom_edge_link(aes(alpha = correlation),
show.legend = FALSE)+
geom_node_point(color = "lightblue", size = 2) +
geom_node_text(aes(label = name), repel = TRUE) +
theme_void()+
ggtitle("Correlation of terms in Samsung S22 Reviews")
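The following base R sketch illustrates what a pairwise word correlation of this kind measures. It assumes that the correlation is the Pearson correlation of binary indicators (word present in a review or not), which is also known as the phi coefficient. The four reviews are hypothetical.

```r
# Hypothetical reviews, already tokenised into words
reviews <- list(
  c("battery", "life", "great"),
  c("battery", "life", "poor"),
  c("camera", "great"),
  c("camera", "quality", "great")
)

# Binary review-by-word indicator matrix: 1 if the word occurs in the review
vocab <- sort(unique(unlist(reviews)))
presence <- t(vapply(reviews,
                     function(w) as.numeric(vocab %in% w),
                     numeric(length(vocab))))
colnames(presence) <- vocab

# Pearson correlation of the indicator columns (phi coefficient)
word_cor <- cor(presence)
round(word_cor["battery", "life"], 2)
round(word_cor["battery", "camera"], 2)
```

Words that always appear in the same reviews get a correlation of 1, while words that never appear together in this toy data get a negative correlation.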
# Correlation Terms
# The correlation of appearing together in a review
apple.correlation.terms <- iphone.reviews.text %>%
mutate(review = row_number()) %>%
unnest_tokens(word, review_text) %>%
filter(!word %in% stop_words$word) %>%
group_by(word) %>%
filter(n() >= 5)%>%
pairwise_cor(word, review, sort = TRUE)
apple.correlation.terms
library(ggraph)
library(igraph)
apple.correlation.terms %>%
filter(correlation >= 0.50) %>%
graph_from_data_frame() %>%
ggraph(layout = "igraph", algorithm = "kk") +
geom_edge_link(aes(alpha = correlation),
show.legend = FALSE)+
geom_node_point(color = "lightblue", size = 2) +
geom_node_text(aes(label = name), repel = TRUE) +
theme_void()+
ggtitle("Correlation of terms in Apple iPhone 13 Reviews")
bigrams.network.df_s <- samsung.reviews.text %>%
unnest_tokens(bigram, review_text, token = "ngrams", n = 2) %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
filter(!word1 %in% stop_words$word & !word2 %in% stop_words$word) %>%
dplyr::count(word1, word2, sort = TRUE) %>%
filter(n > 5)
bigrams.network_s <- graph_from_data_frame(bigrams.network.df_s)
bigrams.network_s
IGRAPH ec74b54 DN-- 12 8 --
+ attr: name (v/c), n (e/n)
+ edges from ec74b54 (vertex names):
[1] battery->life s22 ->ultra 13 ->pro iphone ->13 pro ->max screen ->protector galaxy ->s22
[8] samsung->galaxy
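How bigrams are formed and counted can be sketched in base R as follows. The three reviews and the helper `make_bigrams()` are hypothetical, and the stop word filter of the analysis is omitted to keep the example short.

```r
# Hypothetical cleaned reviews
reviews <- c("battery life is great", "great battery life", "screen protector fits")

# Split a review into words and pair every word with its successor
make_bigrams <- function(text) {
  words <- strsplit(text, " +")[[1]]
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

# Count how often each bigram occurs across all reviews
bigrams <- unlist(lapply(reviews, make_bigrams))
bigram_counts <- sort(table(bigrams), decreasing = TRUE)
bigram_counts
```

Each counted bigram then becomes a directed edge from its first to its second word, with the count as an edge attribute.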
bigrams.network.df_a <-iphone.reviews.text %>%
unnest_tokens(bigram, review_text, token = "ngrams", n = 2) %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
filter(!word1 %in% stop_words$word & !word2 %in% stop_words$word) %>%
dplyr::count(word1, word2, sort = TRUE) %>%
filter(n > 2)
bigrams.network_a <- graph_from_data_frame(bigrams.network.df_a)
Warning in graph_from_data_frame(bigrams.network.df_a) :
In `d' `NA' elements were replaced with string "NA"
bigrams.network_a
IGRAPH fc5eca6 DN-- 92 75 --
+ attr: name (v/c), n (e/n)
+ edges from fc5eca6 (vertex names):
[1] iphone ->13 battery ->life NA ->NA iphone ->12 camera ->quality
[6] battery ->backup iphone ->11 android ->user pro ->max 13 ->pro
[11] amazing ->phone picture ->quality apple ->products cinematic ->mode iphone ->user
[16] nice ->product phone ->12 service ->centre time ->iphone amazing ->battery
[21] android ->phone buy ->iphone iphone ->7 original ->product phone ->13
[26] refresh ->rate sound ->quality amazing ->camera battery ->performance buy ->samsung
[31] camera ->battery camera ->lens heating ->issues iphone ->6s iphone ->8
[36] mind ->blowing nice ->phone night ->mode oneplus ->5t user ->experience
+ ... omitted several edges
# Now we compute the centrality measures of the network.
# Degree: the number of edges adjacent to a node (a measure of direct influence).
deg_s <- degree(bigrams.network_s, mode = "all")
# K-core decomposition identifies the core and the periphery of the network.
# A k-core is a maximal subgraph in which every node has degree at least k.
core_s <- coreness(bigrams.network_s, mode = "all")
# Betweenness measures brokerage or gatekeeping potential: (approximately) the
# number of shortest paths between other nodes that pass through a node.
betw_s <- betweenness(bigrams.network_s)
# Eigenvector centrality measures being well connected to the well connected
# (first eigenvector of the adjacency matrix). The directed bigram graph is
# acyclic, so directed = TRUE returns all zeros; the edges are treated as undirected.
eigen_s <- eigen_centrality(bigrams.network_s, directed = FALSE)
members_s <- cluster_walktrap(bigrams.network_s)
# Remove loops and multiple edges before plotting.
bigrams.network_s <- simplify(bigrams.network_s)
V(bigrams.network_s)$color <- members_s$membership + 1
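As a plausibility check, the degree of each node can also be computed by hand. The sketch below does this in base R, without igraph, for the eight edges of the Samsung bigram network as printed in the igraph output above; the object names are hypothetical.

```r
# Edges of the Samsung bigram network as printed in the igraph output above
edges <- data.frame(
  from = c("battery", "s22", "13", "iphone", "pro", "screen", "galaxy", "samsung"),
  to   = c("life", "ultra", "pro", "13", "max", "protector", "s22", "galaxy"),
  stringsAsFactors = FALSE
)

# Degree with mode = "all": count every appearance of a node as source or target
deg_demo <- table(c(edges$from, edges$to))
sort(deg_demo, decreasing = TRUE)

# In-degree and out-degree separately
in_deg  <- table(edges$to)
out_deg <- table(edges$from)
```

With only 12 nodes and 8 edges no node has a degree above 2, so node sizes based on degree differ very little in the Samsung plots.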
# Using "Coreness" as size
# Coreness: the largest k for which the node belongs to the k-core
plot(bigrams.network_s,
layout = layout_with_fr,
vertex.label.color = "black",
vertex.label.cex = 0.9,
vertex.label.dist = 0,
vertex.frame.color = 0,
vertex.size = core_s*10,
edge.arrow.size = 0.01,
edge.curved = 0.7,
edge.color = "gray",
main = "Bigram Communities (Samsung)"
)
mtext("Coreness")
# Using "Degree" as size
# Degree: number of edges of the node (in-degree is often read as prestige)
plot(bigrams.network_s,
layout = layout_with_fr,
vertex.label.color = "black",
vertex.label.cex = 0.9,
vertex.label.dist = 0,
vertex.frame.color = 0,
vertex.size = deg_s,
edge.arrow.size = 0.01,
edge.curved = 0.7,
edge.color = "gray",
main = "Bigram Communities (Samsung)"
)
mtext("Degree")
# Using "Eigenvector Centrality" as size
# centrality (the most connected words)
plot(bigrams.network_s,
layout = layout_with_fr,
vertex.label.color = "black",
vertex.label.cex = 0.8,
vertex.label.dist = 0,
vertex.size = eigen_s$vector*20,
edge.arrow.size = 0.01,
edge.curved = 0.7,
edge.color = "black",
main = "Bigram Communities (Samsung)"
)
mtext("Eigenvector Centrality")
# Using "Betweenness" as size
# Betweenness: number of shortest paths going through the node
plot(bigrams.network_s,
layout = layout_with_fr,
vertex.label.color = "black",
vertex.label.cex = 0.8,
vertex.label.dist = 0,
vertex.size = betw_s,
edge.arrow.size = 0.01,
edge.curved = 0.7,
edge.color = "lightgrey",
main = "Bigram Communities (Samsung)"
)
mtext("Betweenness")
# Now we compute the centrality measures of the network.
# Degree: the number of edges adjacent to a node (a measure of direct influence).
deg_a <- degree(bigrams.network_a, mode = "all")
# K-core decomposition identifies the core and the periphery of the network.
# A k-core is a maximal subgraph in which every node has degree at least k.
core_a <- coreness(bigrams.network_a, mode = "all")
# Betweenness measures brokerage or gatekeeping potential: (approximately) the
# number of shortest paths between other nodes that pass through a node.
betw_a <- betweenness(bigrams.network_a)
# Eigenvector centrality measures being well connected to the well connected
# (first eigenvector of the adjacency matrix). On a directed acyclic graph it is zero for every node.
eigen_a <- eigen_centrality(bigrams.network_a, directed = TRUE)
members_a <- cluster_walktrap(bigrams.network_a)
# Remove loops and multiple edges before plotting.
bigrams.network_a <- simplify(bigrams.network_a)
V(bigrams.network_a)$color <- members_a$membership + 1
# Using "Coreness" as size
# Coreness: the largest k for which the node belongs to the k-core
plot(bigrams.network_a,
layout = layout_with_fr,
vertex.label.color = "black",
vertex.label.cex = 0.9,
vertex.label.dist = 0,
vertex.frame.color = 0,
vertex.size = core_a*10,
edge.arrow.size = 0.01,
edge.curved = 0.7,
edge.color = "gray",
main = "Bigram Communities (iPhone 13)"
)
mtext("Coreness")
# Using "Degree" as size
# Degree: number of edges of the node (in-degree is often read as prestige)
plot(bigrams.network_a,
layout = layout_with_fr,
vertex.label.color = "black",
vertex.label.cex = 0.9,
vertex.label.dist = 0,
vertex.frame.color = 0,
vertex.size = deg_a,
edge.arrow.size = 0.01,
edge.curved = 0.7,
edge.color = "gray",
main = "Bigram Communities (iPhone 13)"
)
mtext("Degree")
# Using "Eigenvector Centrality" as size
# centrality (the most connected words)
plot(bigrams.network_a,
layout = layout_with_fr,
vertex.label.color = "black",
vertex.label.cex = 0.8,
vertex.label.dist = 0,
vertex.size = eigen_a$vector*20,
edge.arrow.size = 0.01,
edge.curved = 0.7,
edge.color = "black",
main = "Bigram Communities (iPhone 13)"
)
mtext("Eigenvector Centrality")
# Using "Betweenness" as size
# Betweenness: number of shortest paths going through the node
plot(bigrams.network_a,
layout = layout_with_fr,
vertex.label.color = "black",
vertex.label.cex = 0.8,
vertex.label.dist = 0,
vertex.size = betw_a,
edge.arrow.size = 0.01,
edge.curved = 0.7,
edge.color = "lightgrey",
main = "Bigram Communities (iPhone 13)"
)
mtext("Betweenness")